CASE-BASED ORGANIZING AND QUERYING OF A DATABASE

1. Field of the Invention

This invention relates to case-based organizing and
querying of a database.

2. Description of Related Art

As storage capability grows for computing devices, many 
databases have become larger, and large databases have become 
more common. One problem which has become apparent in the art is 
the difficulty of retrieving information from large databases 
when the location of that desired information is not already 
known. For example, a search for information in a large library 
may be hampered by the size of the library, because of the large 
number of items which must be examined. This can be exacerbated 
if the information searched for is not well described by the
searcher, if the searcher is unfamiliar with that subject matter, 
or if the information searched for is not well indexed. 

Large databases of objects may sometimes be generated 
without the original intent to organize them into a database. 
For example, newspaper articles may generally be written without 
the consideration that they may be collected into a single 
database for later search. When they eventually are collected 
into a database, the effort required to organize those objects
into a database for information retrieval can be formidable. It
would be advantageous to provide a system in which a large amount
of information may be collected into a database without having to 
expend a comparable amount of effort on organization and 
indexing, e.g., where such organization and indexing can be done 
by an automated process. 

Prior art methods of retrieving information generally 
require preparation of a query, in which objects to be searched 
for are described in some formal manner. This imposes additional 
effort on the searcher, and generally also requires that the 
searcher be familiar with the subject matter to be searched, with 
the organization and indexing of the database, and with a formal 
query language. Accordingly, it would be advantageous for the 
searcher to be able to describe the query in a natural and 
relatively informal or unstructured manner, such as a description 
in a natural language. 

Work with case-based systems has shown that incremental 
refinement of problem descriptions can be valuable in improving an
automated system's recall (ability to retrieve objects which are
related to the query) and precision (ability to rule out objects
which are not related to the query). It would be advantageous to
be able to incrementally refine the query after a response. But 
when the query itself is unstructured, the original response may 
provide so much information that valuable material is lost in the 
size of the response. Accordingly, it would be advantageous to 
provide suggestions for incremental refinement. In one aspect of 
the invention, the response may be organized by quality of match.
In another aspect, the response may be organized into clusters of
related objects. 

SUMMARY OF THE INVENTION 

The invention provides a system for case-based 
organizing and querying of a database. The database may comprise 
a set of objects, such as a set of documents including text. In 
a preferred embodiment, the database may be organized by 
examining each object and associating that object with a set of 
property values, such as (in the case of text documents) a set of 
keywords or other indicators of content. For example, a document 
may be associated with those words which appear more frequently 
in the document than in the database at large, or which appear in 
early text of the document, or which appear in a title. The 
system may be responsive to a query by associating the query with 
a similar set of property values and performing case-based 
matching or other fuzzy associative matching on the objects of 
the database for objects which are similar. In a preferred 
embodiment, the query may be natural-language text and may be
associated with keywords or other indicators of its content. 

In a preferred embodiment, the system may present 
matched objects in response to the query, may respond to 
iterative refinement of the query (in similar manner to iterative 
case-based methods shown in those co-pending applications which 
have been incorporated by reference), and may order matched
objects by quality of match. The system may also examine the
collection of matched objects and organize them for presentation;
for example, the system may group matched objects into clusters 
of objects which have similar properties, which relate to similar 
content, or which have similar likelihood to be of relevance to 
the query or of interest to an operator posing the query. The 
system may respond to the result of organizing matched objects 
for presentation with suggestions for iterative refinement of the 
query. 

The system may therefore be capable of producing 
improved recall and precision over prior art techniques. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a block diagram of a database explorer
and filter system. 

Figure 2 shows a data flow diagram of a method of 
filtering documents. 

Figure 3 shows a data flow diagram of a method of 
processing queries. 

Figure 4 shows a data flow diagram of a method of 
processing hit tables. 

Figure 5 shows a process flow diagram of a method of 
clustering hit tables.

Figure 6 shows an example explorer user interface
screen as viewed by an operator. 

Figure 7 shows a second example explorer user interface 
screen, as viewed by an operator, in which clusters are 
displayed. 

Figure 8 shows an example explorer user interface 
screen, as viewed by an operator, in which settings may be set by 
the operator. 

Appendix A shows a table of parts of speech and a set
of lexical rules for the English language, which may be used for
the tag-and-segment-text processes in a preferred embodiment.

Appendix B shows an output of a test run of an example 
filter when applied to a portion of an example multimedia 
encyclopedia used as a database, available as "Microsoft Encarta" 
from Microsoft Corporation of Redmond, Washington. 

DESCRIPTION OF THE PREFERRED EMBODIMENT 

An embodiment of this invention may be used together 
with inventions which are disclosed in a copending application 
titled "AUTONOMOUS LEARNING AND REASONING AGENT", application 
Serial No. 07/869,926, filed April 15, 1992 in the name of
Bradley P. Allen, hereby incorporated by reference as if fully
set forth herein. 

In a preferred embodiment, the invention may operate in 
conjunction with a computing system, including a processor and a 
memory, generally configured as is well known in the art; the
memory may include primary memory for stored programs and for 
data and secondary memory for extensive storage of large numbers 
of objects. Preferably, the memory may comprise a sizable 
database of objects, as is well known in the art of databases, 
and such objects may comprise various types of computing and 
data-storage structures. However, no particular structure is 
required for the database itself; the database may be a 
relational database, an unstructured collection of objects, or 
some other database format. 

Although the invention is disclosed herein primarily 
with respect to textual objects, it would be clear to those of 
ordinary skill in the art, after perusal of the application, that 
extension of the concepts disclosed to other types of objects is 
within the scope and spirit of the invention, and would not 
require undue experimentation. Such other types of objects may
include source code, object code, binary values, numeric values, 
text or other symbolic values, representations of sound and/or 
picture signals or other signals, multimedia, data structures for 
rule-based or case-based systems, artificial neural networks, 
linked data structures such as linked lists, mathematical 
structures such as equations, polynomials, matrices or tensors,
and other data types known in at least one of the many fields of
computing. Although when the invention is applied to textual 
objects, appearance of a text string in an object is considered 
pertinent, when the invention is applied to other types of 
objects, other measures of closeness or pertinence, such as
numerical closeness, would be workable, and are within the scope 
and spirit of the invention. 

FILTER AND EXPLORER SYSTEM 

Figure 1 shows a block diagram of a database explorer 
and filter system. 

In a preferred embodiment, a system 101 for case-based 
organizing and querying of a database 102 may comprise a filter 
103, for organizing the database 102 so as to be responsive to a 
query 104, an explorer 105, for selecting a set of objects 106 in 
the database 102 which are responsive to that query 104, and an 
object file system 107, for accessing the database 102. In a 
preferred embodiment, the database 102 may generally be of a type
which is known in the art, such as a collection of text objects 
supported by Cairo Milestone 4 running under the Windows NT 
system version 297, available from Microsoft Corporation of 
Redmond, Washington, and may be accessed in conjunction with the 
object file system 107 of that product. 

The filter 103 may operate at an initialization time, 
such as when the processor is first started or before the first
query 104 is presented to the explorer 105. The filter 103 may
also operate in an incremental mode, e.g., by updating its
organization of the database 102 periodically, such as upon the
passage of a fixed period of time, when a fixed number of objects 
106 are changed or added to the database 102, when the operation 
of the explorer 105 is degraded below some predetermined level, 
when triggered by an operator 108 in conjunction with a user 
interface 109 (e.g., when a query is presented, by a specific 
command to do so, or as a side effect of another operation), or
otherwise as determined by the database 102 or an external 
manager. 

The filter 103 may examine each of the objects 106 (or 
some predetermined subset of objects 106) in the database 102 and 
associate each object 106 it examines (or some predetermined 
subset of those objects 106) with a set of properties. For a 
textual database 102 as primarily described herein, those 
properties may be keywords or phrases which are found in the 
object 106, but may also comprise other property values, such as 
the language the text is written in, the length of the text, or 
the reading level or other measure associated with the text 
(including measures of complexity, detail, redundancy, writing 
style, "fog", or other known measures of text, e.g., known in the
art of grammar checking and correction).

The objects 106 with their properties may be treated as 
a set of cases to be matched by a CBR engine 110 (operating with 
the object file system 107) with a test case generated from the
query 104. Each case may generally comprise an object 106 plus
the properties that object 106 was associated with, e.g., key 
words and phrases found in that object. In a preferred 
embodiment, these properties may include a lexicon of words and 
noun phrases found in the object 106, including at least some of 
these words labelled as a set of "header words" or "relevant 
words".

The explorer 105 may generally operate at a question 
time, such as when one or more queries 104 is presented to the 
explorer 105. In a preferred embodiment, the explorer 105 may be 
invoked by the operator 108 in conjunction with the user 
interface 109, which user interface 109 may allow the operator to 
trigger operation of the explorer 105 and to present one or more 
queries 104 to the explorer 105. In a preferred embodiment, the 
user interface 109 may be one such as the user interface 
presented by the Windows NT system referred to herein. In a 
preferred embodiment, the operator 108 may be a human being, but 
those of ordinary skill will recognize, after perusal of the
application, that the operator 108 may comprise a network 
connection, an external management program, or an AI program. 

In a preferred embodiment, the explorer 105 may 
generate a response 111 including a set of matching cases (i.e., 
objects 106 with their properties), which may be presented to the
operator 108 by means of the user interface 109, such as the user 
interface presented by the Windows NT system referred to herein, 
augmented by features described herein.

The filter 103 and the explorer 105 may operate in
conjunction with the object file system 107 (and in particular
the CBR engine 110 thereof), which may respond to a set of
properties formed into a vector query 112 directed at the 
database 102, and may return a hit table 113 of those objects 106 
in the database 102 which have the indicated properties. In a 
preferred embodiment, the CBR engine 110 may use case-based 
matching and other techniques such as those shown in those
co-pending applications which have been incorporated by reference.

FILTERING DOCUMENTS 

Figure 2 shows a data flow diagram of a method of 
filtering documents. 

In a preferred embodiment, a document 201 (an object 
106 which comprises text, such as a pure text document or a text 
document formatted for a word-processing program) may be input to 
the filter 103 for examination. The filter 103 may process the 
text by a tag-and-segment-text process 202, which may lexically 
analyze the document 201, e.g., by means of a known lexical
analysis technique.

The tag-and-segment-text process 202 may extract a set 
of single terms 203 and generate a set of header words 204 found 
in the document 201. The header words 204 may comprise those 
words which occur in an initial part of the object 106, or in a 
title, subject line, topical paragraph, or abstract. In a
preferred embodiment, the header words 204 may comprise the first
three things mentioned in the document 201. 

The tag-and-segment-text process 202 may also tag words 
in the document 201 with their parts of speech and parse them 
into a set of sentences 205. The sentences 205 may be input to 
an extract-noun-phrases process 206, which may further lexically
analyze the document 201, e.g., by means of a known lexical 
analysis technique, to extract a set of noun phrases 207 and 
generate a lexicon 208 thereof. In a preferred embodiment, the 
tag-and-segment-text process 202 may use a grammar of the English 
language, but other natural languages, and even formal 
specification languages such as programming languages, would also 
be suitable. 

The tag-and-segment-text process 202 may also recognize 
and generate a set of proper nouns 209. In a preferred 
embodiment, the set of proper nouns 209 may be determined by 
known rules, e.g., that proper nouns generally comprise strings 
of words each starting with an upper-case letter, or by reference 
to a dictionary of known proper names. The set of proper nouns 
209 may be input, along with at least some of the single terms 
203, to a determine-relevant-words process 210, which may extract
a set of relevant words 211. 
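
By way of illustration only, the capitalization rule mentioned above might be sketched in Python as follows. The function name find_proper_nouns and the regular expression are assumptions of this sketch and are not taken from the disclosure; the sketch is deliberately naive and does not consult a dictionary of known proper names.

```python
import re

# One capitalized word, optionally followed by further capitalized words
# separated by single spaces (hypothetical pattern for this sketch).
_CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")


def find_proper_nouns(text: str) -> list[str]:
    """Return maximal runs of consecutive capitalized words in the text.

    Limitation: a word that is capitalized only because it begins a
    sentence is returned as well, so a fuller implementation would need a
    further check, such as a dictionary of known proper names.
    """
    return _CAPITALIZED_RUN.findall(text)
```

For example, applied to the text "the lamp was improved by Thomas Edison in Menlo Park", the sketch returns the two strings "Thomas Edison" and "Menlo Park".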

The set of relevant words 211 may be determined with 
reference to the frequency of those words in the object 106 (with 
respect to the entire text found in the object 106) and with
reference to the frequency of those words in the database 102,
with respect to the text corpus of the database 102. In a
preferred embodiment, the ratio for each word (frequency in the 
object 106) divided by (frequency in the database 102) may be 
computed, and the set of relevant words 211 may comprise those 
words whose relative frequency exceeds a threshold, e.g., a 
predetermined threshold such as a 1:1 ratio. However, it would 
be clear to those of ordinary skill, after perusal of this 
application, that other measures (e.g., statistical measures) 
relating to frequency could be used to determine relevant words, 
such as clustering of relevant words in paragraphs, correlation 
with other relevant words, or relative frequency of word pairs or 
n-tuples, and that such other measures are within the scope and
spirit of the invention. 
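
By way of illustration only, the frequency-ratio test described above might be sketched in Python as follows. The function name relevant_words, its parameters, and the decision to skip words that have no count in the database are assumptions of this sketch and are not taken from the disclosure.

```python
from collections import Counter


def relevant_words(doc_tokens, corpus_counts, corpus_total, threshold=1.0):
    """Return words whose relative frequency in one object exceeds a threshold.

    doc_tokens    -- list of words found in the object
    corpus_counts -- mapping of word -> occurrence count in the whole database
    corpus_total  -- total number of word occurrences in the database
    threshold     -- ratio to exceed; 1.0 corresponds to the 1:1 ratio
    """
    doc_counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)
    relevant = set()
    for word, count in doc_counts.items():
        corpus_count = corpus_counts.get(word, 0)
        if corpus_count == 0:
            # Assumption: a word with no database count is skipped rather
            # than treated as infinitely relevant.
            continue
        doc_frequency = count / doc_total
        corpus_frequency = corpus_count / corpus_total
        if doc_frequency / corpus_frequency > threshold:
            relevant.add(word)
    return relevant
```

In this sketch a very common word such as "the" tends to have a ratio near 1:1 and is therefore excluded, while a word concentrated in the object is retained.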

The filter 103 is described herein for a specific set 
of properties of the text which may be extracted. However, it 
would be clear to those of ordinary skill, after perusal of this
application, that extraction of other properties could be readily 
accomplished, and is within the scope and spirit of the 
invention. Such other properties could include the language the 
text is written in (or for English-language text, the number of
foreign words used), the length of the text, or the reading level 
or other measure associated with the text (including measures of 
complexity, detail, redundancy, writing style, "fog", or other 
known measures of text, e.g., known in the art of grammar 
checking and correction).

In a preferred embodiment, the extract-noun-phrases
process 206 and the determine-relevant-words process 210 may
proceed in parallel, e.g., by execution on multiple processors or
by multiple tasks or threads in a multitasking or multithreaded
environment.

The filter 103 may mark each object 106 with the 
properties it determines (or alternatively may create a separate 
object 106 relating each documentary object 106 to its 
properties), so that the object 106 and its properties may be
treated as a case in a case-base. In a preferred embodiment, the 
set of cases may be matched to a test case by a CBR engine 110, 
using techniques like those described in copending applications 
(1) Serial No. 07/664,561, filed March 4, 1991 in the name of
inventors Bradley P. Allen and S. Daniel Lee, titled "CASE-BASED
REASONING SYSTEM"; (2) Serial No. 07/869,935, filed April 15,
1992 in the name of inventor Bradley P. Allen, titled "MACHINE
LEARNING WITH A RELATIONAL DATABASE"; and (3) Serial No.
07/869,926, filed April 15, 1992 in the name of Bradley P. Allen,
titled "AUTONOMOUS LEARNING AND REASONING AGENT"; each of which 
is hereby incorporated by reference as if fully set forth herein, 
or other case-based reasoning techniques which may be known in 
the art. 

PROCESSING QUERIES 

Figure 3 shows a data flow diagram of a method of 
processing queries. 






95/02221 



In a preferred embodiment, the query 104, entered in
free text by the operator 108, may be input to the explorer 105 
for examination. The explorer 105 may process the text by a
tag-and-segment-text process 301, which may lexically analyze the
document 201, e.g., by means of a known lexical analysis
technique, similarly to the tag-and-segment-text process 202 of
the filter 103. 

The tag-and-segment-text process 301 may extract a set 
of single terms 302, similarly to the tag-and-segment-text 
process 202 and the set of single terms 203 of the filter 103. 

The tag-and-segment-text process 301 may also tag words 
in the document 201 with their parts of speech and parse them 
into a set of sentences 303, similarly to the
tag-and-segment-text process 202 and the sentences 205 of the filter 103. The
sentences 303 may be input to an extract-noun-phrases process
304, which may further lexically analyze the document 201, e.g., 
by means of a known lexical analysis technique, to extract a set 
of noun phrases 305, similarly to the extract-noun-phrases 
process 206 and the noun phrases 207 of the filter 103. 

The tag-and-segment-text process 301 may also recognize 
and generate a set of proper nouns 306, similarly to the
tag-and-segment-text process 202 and the proper nouns 209 of the filter
103.

The noun phrases 305, single terms 302, and proper
nouns 306, a rank threshold 307, and a set of selected subtopics
308 (subtopics selected by the operator 108 to refine the query
104) may be input to a generate-query process 309, which may
generate a set of query terms 310 and a query parse tree 311. 

In a preferred embodiment, the tag-and-segment-text 
process 301, the extract-noun-phrases process 304, and the
generate-query process 309 may proceed as asynchronously as
possible, e.g., by execution on multiple processors or by
multiple tasks or threads in a multitasking or multithreaded
environment.

The query terms 310 and the query parse tree 311 may be
input to the CBR engine 110 in the object file system 107, which
may perform case-based matching or other fuzzy associative
matching on the objects 106 in the database 102 for objects which 
are similar to the query 104, as described by the query terms 310 
and the query parse tree 311, and which have a match quality at 
least as good as the rank threshold 307. (As noted with regard 
to the user interface 109, the selected subtopics 308 are added 
to the text of the query 104.) The object file system 107 may 
generate the hit table 113 of matched objects 106. 
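
By way of illustration only, the selection of matched objects against a rank threshold might be sketched in Python as follows. The matching actually performed by the CBR engine 110 is described in the co-pending applications and is not reproduced here; the overlap score used below (the fraction of query terms found among the properties of a case) is a stand-in assumed for this sketch, as are the names match_cases, cases, and rank_threshold.

```python
def match_cases(query_terms, cases, rank_threshold):
    """Return (object_id, score) pairs whose score meets the rank threshold.

    query_terms    -- iterable of terms derived from the query
    cases          -- mapping of object id -> set of property terms
    rank_threshold -- minimum acceptable score, between 0.0 and 1.0
    """
    query = set(query_terms)
    if not query:
        return []
    hits = []
    for object_id, properties in cases.items():
        # Stand-in score: share of the query terms present in the case.
        score = len(query & properties) / len(query)
        if score >= rank_threshold:
            hits.append((object_id, score))
    # Best match first; ties broken by object id for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

The returned list plays the role of a hit table in this sketch: objects below the threshold are omitted and the remainder are ordered by quality of match.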

PROCESSING HIT TABLES 

Figure 4 shows a data flow diagram of a method of 
processing hit tables.

The hit table 113 and the relevant words 211 may be
input to a cluster hits process 401, which (if clustering is 
enabled) collects the matched objects 106 into clusters, and may 
output a set of clusters 402 in response. Each cluster 402 may 
comprise a set of objects 106, selected for collective closeness 
with regard to all objects 106 in the hit table 113. The cluster 
hits process 401 is further described with regard to figure 5. 

The hit table 113, the relevant words 211, and the 
lexicon 208 may be input to a first generate-topics (from
relevant words) process 403, while the lexicon 208 and the query
terms 310 may be input to a second generate-topics (from query
words) process 403. Together the two generate-topics processes
403 may output a set of topics 404 and subtopics 405. 

In a preferred embodiment, the generate-topics process 
403 may examine the lexicon 208 of noun phrases 207 with a
rule-based inference engine (not shown). (One such inference engine
is the ART-IM system, available from Inference Corporation in El 
Segundo, California.) The inference engine may detect particular 
patterns in the noun phrases 207 which indicate semantic 
relations between the words in those noun phrases 207. For 
example, the noun phrase 

"kangaroos, wallabies, and other marsupials" 

would be detected and would generate the relations

kangaroo IS-A marsupial
wallaby IS-A marsupial

The generate-topics process 403 may thus construct a
phrase lattice, showing each noun phrase 207 as being inclusive
of (above), included in (below), or incommensurate with (neither
above nor below) each other noun phrase 207. 
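
By way of illustration only, detection of the pattern in the example "kangaroos, wallabies, and other marsupials" might be sketched in Python as follows. The disclosure contemplates a rule-based inference engine for this purpose; the regular expression, the crude singularization, and the names singular and is_a_relations are assumptions of this sketch and cover only this one pattern.

```python
import re


def singular(word: str) -> str:
    """Crude singularization for this sketch, e.g. 'wallabies' -> 'wallaby'."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word


def is_a_relations(noun_phrase: str) -> list[tuple[str, str]]:
    """Return (member, category) pairs for phrases of the form
    'A, B, and other Cs'; return an empty list for any other phrase."""
    match = re.fullmatch(r"(.+?),? and other (\w+)", noun_phrase.strip())
    if match is None:
        return []
    members = [m.strip() for m in match.group(1).split(",") if m.strip()]
    category = singular(match.group(2))
    return [(singular(member), category) for member in members]
```

Each returned pair may be read as "member IS-A category", so the example phrase yields the pairs (kangaroo, marsupial) and (wallaby, marsupial).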

The generate-topics (from relevant words) process 403
may restrict the phrase lattice to those noun phrases 207 which
include relevant words 211 of the objects 106 in the hit table
113. In a preferred embodiment, the second generate-topics (from 
query words) process 403 may operate in similar manner as the 
first generate-topics (from relevant words) process 403 and may 
restrict the phrase lattice to those noun phrases 305 which 
include relevant words 211 of the query. 

Figure 5 shows a process flow diagram of a method of 
clustering hit tables. 

The cluster hits process 401 may operate by means of a 
genetic algorithm, in which an initial configuration and a set of 
genetic operators are specified, and the set of solutions is 
formed by simulation of random "evolution" of a population of 
possible solutions, using the method of steady-state reproduction 
without duplicates. Genetic algorithms are well known in the 
art, and are described in further detail in "Foundations of 
Genetic Algorithms", ed. Gregory J.E. Rawlins (Morgan Kaufmann
Publishers: San Mateo, California 1991). It would be clear to
those of ordinary skill in the art that the parameters of the 
genetic algorithm, and even the type of genetic algorithm 
performed could be varied substantially and still remain within 
the scope and spirit of the invention. 
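
By way of illustration only, steady-state reproduction without duplicates might be sketched in Python as follows. The function name steady_state, its parameters, and the use of a fixed number of steps as the stopping rule are assumptions of this sketch; the genes, the operator that creates a new gene, and the quality measure are supplied by the caller.

```python
def steady_state(population, make_child, quality, steps):
    """Evolve a population one gene at a time and return the best gene.

    population -- initial list of genes (any values comparable with ==)
    make_child -- function taking the current population, returning a new gene
    quality    -- function taking a gene, returning a number (higher is better)
    steps      -- number of reproduction attempts (stopping rule of the sketch)
    """
    population = list(population)  # work on a copy of the caller's list
    for _ in range(steps):
        child = make_child(population)
        if child in population:
            # "Without duplicates": a gene already present is discarded.
            continue
        # Steady state: the gene of least quality is replaced by the child,
        # so the population size never changes.
        worst = min(population, key=quality)
        population.remove(worst)
        population.append(child)
    return max(population, key=quality)
```

Because only the gene of least quality is replaced at each step, the best gene in a population of two or more genes is never lost in this sketch.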

In a cluster-count step 501, a number of clusters 402 
is selected. The number of clusters 402 may vary from a known 
minimum to a known maximum, settable by the operator 108. The 
genetic algorithm of the following steps is repeated for each 
permissible number of clusters 402, and the best solution 
adopted. 

In an initiate-clusters step 502, a set of possible 
clusters 402 is selected; this is a single "gene". A random 
population of genes is selected. Each cluster 402 is represented
by the centroid of the objects 106 which would comprise that 
cluster 402. Thus, when a solution of clusters 402 is selected, 
each object 106 is assigned to the cluster 402 which it best 
matches.
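
By way of illustration only, representing a cluster by a centroid and assigning each object to its best-matching cluster might be sketched in Python as follows. The representation of an object as a set of relevant words, the representation of a centroid as a mapping from each word to the fraction of members containing it, and the names centroid, match_quality, and assign are assumptions of this sketch.

```python
def centroid(members):
    """Return word -> fraction of member objects that contain the word.

    members -- list of sets of relevant words, one set per object
    """
    counts = {}
    for words in members:
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return {word: count / len(members) for word, count in counts.items()}


def match_quality(words, center):
    """Average centroid weight over the object's words (stand-in measure)."""
    if not words:
        return 0.0
    return sum(center.get(word, 0.0) for word in words) / len(words)


def assign(objects, centers):
    """Return object id -> index of the centroid the object matches best.

    objects -- mapping of object id -> set of relevant words
    centers -- list of centroids as produced by centroid()
    """
    return {
        object_id: max(
            range(len(centers)),
            key=lambda k: match_quality(words, centers[k]),
        )
        for object_id, words in objects.items()
    }
```

In this sketch an object sharing no words with any centroid is assigned to the first centroid, since all of its match qualities are equal.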

After the initiate-clusters step 502, the genetic 
algorithm of the following steps is repeated for a known period 
of time, settable by the operator 108. When that time expires, 
the best available solution (i.e., the gene with the best 
quality) is selected as the solution and specifies the set of 
clusters 402. Each object 106 is assigned to the cluster 402 to 
which it is the closest.

In an evaluation step 503, all genes in the population
are evaluated for quality, and the gene with the least quality is 
removed. In a preferred embodiment, the statistical measure
"category utility" is computed; i.e., the utility of each cluster
402 in distinguishing an object 106 in one cluster 402
from an object 106 in another cluster 402. Thus, if the centroid of
a cluster 402 has high quality of match for several objects 106, 
those objects are reasonably clustered together. 
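
By way of illustration only, one common formulation of category utility might be sketched in Python as follows. The disclosure does not give a formula; the sketch assumes binary attributes (the presence or absence of each word in an object) and computes, averaged over the clusters, the weighted gain in the sum of squared attribute-value probabilities within each cluster over the same sum for the collection as a whole. The name category_utility and the representation of objects as sets of words are assumptions of this sketch.

```python
def category_utility(clusters):
    """Return the category utility of a partition of objects into clusters.

    clusters -- list of clusters, each a non-empty list of sets of words
    """
    all_objects = [words for cluster in clusters for words in cluster]
    total = len(all_objects)
    vocabulary = set().union(*all_objects)

    def squared_probabilities(objects):
        # Sum over words of P(present)^2 + P(absent)^2 within these objects.
        result = 0.0
        for word in vocabulary:
            p = sum(1 for words in objects if word in words) / len(objects)
            result += p * p + (1 - p) * (1 - p)
        return result

    baseline = squared_probabilities(all_objects)
    gain = sum(
        len(cluster) / total * (squared_probabilities(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)
```

Under this formulation a partition that separates objects with different words scores higher than one that mixes them, which is the sense in which the measure rewards clusters that distinguish their members from the members of other clusters.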

Although in a preferred embodiment, matching for 
clusters 402 is performed using relevant words 211, it would be 
clear to those of ordinary skill, after perusal of this 
application, that other properties of the objects 106 could be 
used as well, such as the read/write date of the object 106, and 
that doing so would be within the scope and spirit of the 
invention. 

In a genetic-operator step 504, one of three operators 
is selected and employed to create a new gene: (1) Mutation-1.
The new gene is randomly created. (2) Mutation-2. An existing 
gene is copied, except that one of its clusters 402 is mutated by 
replacing it with a randomly created cluster 402. (3) Crossover. 
Two genes have their n-tuples of clusters 402 paired off and one
cluster 402 is selected at random from each pair to form the new 
gene. Alternatively, a new gene is created by selecting N 
clusters 402 at random from the 2N clusters 402 specified by the 
two old genes.
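
By way of illustration only, the three genetic operators and the alternative form of crossover might be sketched in Python as follows. The representation of a gene as a list of clusters, the representation of a randomly created cluster as a small set of words drawn from a vocabulary, and all function names are assumptions of this sketch.

```python
import random


def random_cluster(vocabulary, rng, size=2):
    """Create a random cluster, here a set of `size` words (hypothetical)."""
    return frozenset(rng.sample(sorted(vocabulary), size))


def mutation_1(vocabulary, n_clusters, rng):
    """Mutation-1: the new gene is created entirely at random."""
    return [random_cluster(vocabulary, rng) for _ in range(n_clusters)]


def mutation_2(gene, vocabulary, rng):
    """Mutation-2: copy a gene, replacing one cluster with a random one."""
    child = list(gene)
    child[rng.randrange(len(child))] = random_cluster(vocabulary, rng)
    return child


def crossover(gene_a, gene_b, rng):
    """Crossover: pair off the clusters and pick one from each pair."""
    return [rng.choice(pair) for pair in zip(gene_a, gene_b)]


def crossover_pooled(gene_a, gene_b, rng):
    """Alternative: pick N clusters at random from the 2N of both genes."""
    return rng.sample(list(gene_a) + list(gene_b), len(gene_a))
```

A random number generator is passed explicitly, e.g. random.Random(0), so that a run of the sketch can be repeated.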

USER INTERFACE

Figure 6 shows an example explorer user interface 
screen as viewed by an operator. While the invention is 
described primarily with regard to a specific user interface, it 
would be clear to those of ordinary skill in the art that another 
user interface of equal or greater flexibility would be suitable, 
and would be within the scope and spirit of the invention. 

In a preferred embodiment, the user interface 109 may 
be combined with a user interface for a generalized file system 
exploration program, such as in the Windows NT system referred to 
herein. The user interface 109 may comprise a query window 601 
in which the operator may enter the query 104 in free text, and a 
results window 602 in which the system 101 may display a set of 
matched objects 106 found in response to the query 104. 

In a preferred embodiment, the operator 108 may enter 
the query 104 in the query window 601. The query 104 is input to 
the explorer 105, which processes it as described herein, and 
generates the vector query 112. The vector query 112 is input to
the object file system 107, which generates the hit table 113 of
matched objects 106. The hit table 113 is input to the user 
interface 109, which displays the matched objects 106. The 
operator may select a displayed matched object 106 to view its 
contents.






PCT/USy4/0756y 



In a preferred embodiment, the user interface 109, the 
explorer 105, and the object file system 107, may operate as 
asynchronously as possible. Accordingly, the object file system
107 may search the database 102 for matched objects 106 
independently, once it has sufficient information from the 
explorer 105; the user interface 109 may display matched objects 
106 from the hit table 113 as they are generated by the object 
file system 107. 

In the example, the operator 108 has entered the query 
104 "who invented the light bulb?" in a content field 603 of the 
query window 601, and the system 101 has responded with a set of 
matched objects 106 in the results window 602. The matched 
objects are displayed one per line, in columns labelled "rank", 
"query", "header", and "relevant words". 

In the example, a rank field 604 displays the quality 
of match for each displayed matched object 106. In a preferred 
embodiment, the system 101 may order the matched objects 106 by 
rank. This may occur as the normal procedure, or at the request 
of the operator 108, e.g., by means of a "sort" command 605 in 
the query window 601. In a preferred embodiment, the rank field 
604 may also be color-coded by value. 

In the example, a query field 606 displays the relevant 
words of the query which are most related to the displayed 
matched object 106.

In the example, a header field 607 displays the header
words 204 of the displayed matched object 106. 

In the example, a relevant words field 608 displays the
most common relevant words 211 of the displayed matched object 
106. 

In the example, a topics field 609 of the query window 
601 displays suggested topics for refinement of the query 104 
which the system 101 has identified. In a preferred embodiment, 
the operator 108 may select a topic in the topics field 609, and 
the system will display a subtopics window 610 (overlaid on the
query window 601 and the results window 602) showing the 
subtopics which the system 101 has identified for that topic. 

QUERY REFINEMENT 

The operator 108 may refine the query 104 in response 
to the matched objects 106, and the explorer 105 may attempt to 
match objects 106 using the query 104 as refined. This may occur 
at the request of the operator 108, e.g., by means of a "refresh" 
command 611 in the query window 601.

In a preferred embodiment, the operator 108 may select 
one or more subtopics 405 to refine the query 104. To do so, the 
operator 108 may identify (e.g., by pointing to with a pointing 
device such as a mouse) one or more subtopics 405 in the 
subtopics window 610. The selected subtopics 308 may be "added"
to the query 104 and the explorer 105 may attempt to match
objects 106 using the query 104 as refined. 

In a preferred embodiment, the operator 108 may also 
select one or more relevant words 211 to refine the query 104. To 
do so, the operator 108 may identify (e.g., by pointing to) the 
relevant words field 608 for a particular matched object 106 and 
"drag" that relevant words field 608 to the content field 603; 
the system 101 will display a relevance feedback window 612 
(overlaid on the query window 601 and the results window 602) 
showing the relevant words 211 for that matched object 106. 

In a preferred embodiment, the operator 108 may select 
one or more relevant words 211 to refine the query 104. To do 
so, the operator 108 may identify (e.g., by pointing to) one or 
more relevant words 211 in the relevance feedback window 612. 
The selected relevant words 211 may be "added" to the query 104 
and the explorer 105 may attempt to match objects 106 using the 
query 104 as refined. 

The query 104 as refined (like the original query 104) 
is presented as a vector query 104 to the CBR engine 110. When 
selected subtopics 308 or relevant words 211 are "added" to the 
query, they are properties which the CBR engine 110 must match to 
objects 106, as described for methods of iterative refinement of 
case-based matching shown in those co-pending applications which 
have been incorporated by reference. (Thus, the CBR engine 110 
must match to objects 106 as if the operator 108 had answered a 
query refining question in a case-based system.) A query 104 as 
refined may be further refined, allowing the operator 108 to 
iteratively refine the query 104 until desired objects 106 are 
located. 
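
The iterative refinement described above can be sketched as repeated additions to a weighted term vector. This is a minimal illustration only: the dictionary representation, the function name refine_query and the default weight are assumptions made for the example, and the matching performed by the CBR engine 110 is not reproduced here.

```python
def refine_query(query, selected_terms, weight=1.0):
    """Return a new query vector with the selected terms "added".

    `query` maps a term to a weight. This representation is a
    hypothetical stand-in for the vector query 104; the actual
    form used by the CBR engine is not specified in this section.
    """
    refined = dict(query)  # leave the original query untouched
    for term in selected_terms:
        refined[term] = refined.get(term, 0.0) + weight
    return refined


# An original query, refined twice: once with selected subtopics or
# relevant words, and once more with a term already in the query.
query = {"alcohol": 1.0}
refined = refine_query(query, ["beverage", "liquor"])
refined = refine_query(refined, ["alcohol"])
print(refined)
```

Each call returns a new vector, so any earlier stage of the refinement remains available if the operator wishes to back up.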

VIEWING CLUSTERS 

Figure 7 shows a second example explorer user interface 
screen, as viewed by an operator, in which clusters are 
displayed. 

The operator 108 may select a "cluster" (figure 6) or 
"uncluster" (figure 7) command 701 in the query window 601, 
and the system 101 will display a set of clusters 402, each a set 
of related matched objects 106, in place of displaying matched 
objects 106 themselves. In the example, the operator has 
selected the "cluster" command 701 for the same query 104 as in 
the example of figure 6. 

In the example, an expand field 702 displays whether 
the cluster 402 can be expanded (shown by a " + " symbol) to 
display individual matched objects 106, or can be collapsed 
(shown by a "-" symbol) to display a single identifier for the 
cluster 402. 

In the example, the rank field 703 displays the best 
rank for all matched objects 106 in the cluster 402. In a 
preferred embodiment, the system 101 may order the clusters 402 
by this rank field 703. This may occur as the normal procedure, 
or at the request of the operator 108, e.g., by means of the 
"sort" command 605 in the query window 601. In a preferred 
embodiment, this rank field 703 may also be color-coded by value. 

In the example, the relevant words field 608 displays 
the most common relevant words 211 in the cluster 402. 

Other fields and windows remain similar to the example 
of figure 6. 

The operator 108 may also choose to cluster all objects 
106 in a specific set, e.g., a specific directory in the object 
file system 107. In a preferred embodiment, the operator 108 may 
restrict the scope of the explorer 105 to a specific directory 
and issue the "cluster" command 701; the system 101 will display 
the objects 106 in that directory in clusters 402. 

SETTING PARAMETERS 

Figure 8 shows an example explorer user interface 
screen, as viewed by an operator, in which settings may be set by 
the operator. 

In a preferred embodiment, the operator 108 may select 
settings appropriate for the system 101. The operator 108 may 
select a "properties" command 801 in the query window 601 (figure 
6), and the system 101 will display a properties window 802 with 
a set of property values 803 which may be set. 

A "minimum rank of returned hits" property 804 is a 
threshold value for including matched objects 106; matched 
objects 106 whose rank falls below this value are not displayed 
in the results window 602 and are not used in further processing. 
The rank of a matched object 106 is calculated by the CBR engine 
110. In the example, this value is set to 80. 

A "maximum clustered hits" property 805 is a maximum 
number of matched objects 106 which are included in a single 
cluster 402. Those matched objects 106 not included in clusters 
402 are placed in a special cluster 402 labelled "Other". In the 
example, this value is set to 400. 

A "clustering time" property 806 is the elapsed real 
time devoted to clustering. In the example, this value is set to 
2500 milliseconds. 

A "minimum number of clusters" property 807 is the 
lower bound for the number of clusters 402 generated. In the 
example, this value is set to 2 clusters. 

A "maximum number of clusters" property 808 is the 
upper bound for the number of clusters 402 generated. In the 
example, this value is set to 8 clusters. The system 101 
attempts to generate a number of clusters 402 between the minimum 
and maximum number selected. 

A "maximum topics" property 809 is the maximum number 
of topics displayed in the topics field 609 in the query window 
601. In the example, this value is set to 7 topics. 

A "maximum subtopics" property 810 is the maximum 
number of subtopics displayed in the subtopics window 610. In 
the example, this value is set to 250 subtopics. 

A "do/don't cluster" property 811 sets whether or not 
clustering is performed. In the example, this value is set to 
YES. 

A "do/don't generate query topics" property 812 sets 
whether or not topics and subtopics are generated in response to 
query terms 310. In the example, this value is set to YES. 

A "do/don't generate salient topics" property 813 sets 
whether or not topics and subtopics are generated in response to 
relevant words 211. In the example, this value is set to YES. 

A "boolean/vector query" property 814 sets whether the 
object file system 107 performs a boolean query or a vector query 
in response to the explorer 105. In the example, this value is 
set to vector queries. A boolean query would have boolean 
connectors (e.g., "AND", "OR") coupling the query terms 310, so 
that the query 104 would not be as flexibly matched. Search 
using boolean queries is well known in the art. 
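
The property values 803 described above can be collected in a single settings record, shown here with the example values. This is a sketch only: the class name, field names and helper functions are assumptions made for illustration and do not describe the actual properties window 802 implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplorerSettings:
    """Hypothetical container for properties 804 through 814."""
    min_rank: int = 80                    # property 804
    max_clustered_hits: int = 400         # property 805
    clustering_time_ms: int = 2500        # property 806
    min_clusters: int = 2                 # property 807
    max_clusters: int = 8                 # property 808
    max_topics: int = 7                   # property 809
    max_subtopics: int = 250              # property 810
    do_cluster: bool = True               # property 811
    generate_query_topics: bool = True    # property 812
    generate_salient_topics: bool = True  # property 813
    query_mode: str = "vector"            # property 814: "vector" or "boolean"


def filter_hits(ranked_hits, settings):
    """Drop (title, rank) pairs whose rank falls below the minimum rank."""
    return [(title, rank) for title, rank in ranked_hits
            if rank >= settings.min_rank]


def clamp_cluster_count(proposed, settings):
    """Keep a proposed cluster count within the configured bounds."""
    return max(settings.min_clusters, min(proposed, settings.max_clusters))


settings = ExplorerSettings()
kept = filter_hits([("a", 95), ("b", 80), ("c", 79)], settings)
```

With the example values, a hit ranked 79 is dropped and a proposed count of 12 clusters is reduced to 8.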

APPENDICES 

Appendix A shows a table of parts of speech and a set 
of lexical rules for the English language, which may be used for 
the tag-and-segment-text process or the tag-and-segment-text 
process in a preferred embodiment. 

Appendix B shows an output of a test run of an example 
filter when applied to a portion of an example multimedia 
encyclopedia used as a database, available as "Microsoft Encarta" 
from Microsoft Corporation of Redmond, Washington. 

Alternative Embodiments 

While preferred embodiments are disclosed herein, many 
variations are possible which remain within the concept and scope 
of the invention, and these variations would become clear to one 
of ordinary skill in the art after perusal of the specification, 
drawings and claims herein. 

The following is the current set of rules used for determining noun phrases: 

1. noun-phrase -> proper-noun (e.g. "Elvis") 

2. noun-phrase -> pronoun (e.g. "he") 

3. noun-phrase -> noun (e.g. "cars") 

4. noun-phrase -> gerund (e.g. "running") 

5. noun-phrase -> determiner noun-phrase (e.g. "The person") 

6. noun-phrase -> quantifier noun-phrase (e.g. "Three people") 

7. noun-phrase -> adjective noun-phrase (e.g. "fluffy clouds") 

8. noun-phrase -> adverb noun-phrase (e.g. "maddeningly fluffy clouds") 

9. noun-phrase -> noun noun-phrase (e.g. "printer ribbons") 

10. noun-phrase -> noun-phrase relative-clause (e.g. "The car that hit me") 

11. noun-phrase -> noun-phrase prepositional-phrase 
(e.g. "The person with the most toys") 

12. noun-phrase -> noun-phrase that sentence 
(e.g. "The candidate that I will vote for") 

13. noun-phrase -> noun-phrase [, noun-phrase]* [,] and noun-phrase 
(e.g. "Larry, Moe and Curly") 

14. noun-phrase -> noun-phrase [, noun-phrase]* [,] or noun-phrase 
(e.g. "England, France, or Germany") 

15. noun-phrase -> comparative noun-phrase than noun-phrase 
(e.g. "more tea than China") 
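
As an illustration of how rules 1 through 9 might be applied, the sketch below checks whether a sequence of part-of-speech tags forms a noun phrase. Those nine rules together generate any run of modifiers followed by a single head, so a linear scan suffices. The tag names and the function name are assumptions for this example; rules 10 through 15 need a fuller parser and are not covered.

```python
# Tags that can stand alone as a noun phrase (rules 1-4).
HEADS = {"proper-noun", "pronoun", "noun", "gerund"}

# Tags that may precede a noun phrase (rules 5-9).
MODIFIERS = {"determiner", "quantifier", "adjective", "adverb", "noun"}


def is_noun_phrase(tags):
    """Return True if `tags` is derivable from rules 1-9 alone."""
    if not tags:
        return False
    *modifiers, head = tags
    return head in HEADS and all(tag in MODIFIERS for tag in modifiers)


# "The person", "maddeningly fluffy clouds", "printer ribbons"
print(is_noun_phrase(["determiner", "noun"]))
print(is_noun_phrase(["adverb", "adjective", "noun"]))
print(is_noun_phrase(["noun", "noun"]))
```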

The Find Taxonomic Relations process (process 2.2 in figure 4) uses ART-IM rules to capture 
patterns of words which indicate taxonomic relationships between the words. For example, it detects 
patterns like: 

"... kangaroos, wallabies, and other marsupials ..." 

From this particular phrase, one could reasonably extract the relations 

IS_A(kangaroo,marsupial) and 
IS_A(wallaby,marsupial) 

Other patterns which detect this type of relation, taken from [14], are: 

1. NP such as {NP,}* {(and | or)} NP 

2. such NP as {NP,}* {(and | or)} NP 

3. NP {,NP}* {,} and other NP 

4. NP {,NP}* {,} or other NP 

5. NP {,} including {NP,}* {(and | or)} NP 

6. NP {,} especially {NP,}* {(and | or)} NP 
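
The "and other" and "or other" patterns can be approximated with a regular expression, as sketched below. This is not the ART-IM rule set used by the Find Taxonomic Relations process: it is a simplified Python stand-in that treats every noun phrase as a single word and uses a naive singularization helper. Both simplifications, and all names in the code, are assumptions made for illustration.

```python
import re

# One or more comma-separated words, an optional comma, then
# "and other" / "or other" followed by the more general term.
PATTERN = re.compile(r"(\w+(?:, \w+)*),? (?:and|or) other (\w+)")


def singular(word):
    """Naive singularization; adequate only for regular plurals."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word


def find_is_a(text):
    """Return (specific, general) pairs found in `text`."""
    relations = []
    for match in PATTERN.finditer(text.lower()):
        general = singular(match.group(2))
        for specific in match.group(1).split(", "):
            relations.append((singular(specific), general))
    return relations


pairs = find_is_a("... kangaroos, wallabies, and other marsupials ...")
print(pairs)
```

Applied to the example phrase, the sketch yields the two IS_A relations given above.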

Clustering file afl.txt 
Non-empty clusters: 5 
Clusters: 5 

# Hits Vals Seed, Value: Count 

0 1 0 NONE 

1 2 0 Reuther, Walter Philip, Labor, labor: 2, president: 2, wage: 2 

2 2 0 Railroad Labor Organizations, Brotherhood, Union, united states: 2 

3 7 0 Hillman, Sidney, Labor, labor: 7, afl: 7, union: 4, american federati 

4 2 0 Kirkland, Lane, Labor, director: 2 

Passes: 1029, best pass: 830, best score: 0.955, worst score: 0.170 
Cluster 0, has 1 hits: '' 

Football, Type, United States 
Cluster 1, has 2 hits: 'labor: 2, president: 2, wage: 2' 

Meany, George, Labor 

Reuther, Walter Philip, Labor 
Cluster 2, has 2 hits: 'united states: 2, union: 2, management: 2 ' 

Railroad Labor Organizations, Brotherhood, Union 

Teamsters Union, Full, International Brotherhood 
Cluster 3, has 7 hits: 'labor: 7, afl: 7, union: 4, american federation: 4, cio: 3, o 

American Federation, Labor, Congress 

Gomper, Samuel, Labor 

Green, William, Labor 

Hillman, Sidney, Labor 

Knight, Labor, Union 

Lewi, John L, Labor 

Strike, Labor, Relation 
Cluster 4, has 2 hits: 'director: 2' 

Kirkland, Lane, Labor 

Rozelle, Pete, Full 



Clustering file alcohol.txt 
Non-empty clusters: 5 
Clusters: 5 

# Hits Vals Seed, Value: Count 

0 15 0 (OTHER), blood:3, vitamin:2, tissue:2, poison:2, sugar metabolism 

1 22 0 Antifreeze, Chemical, Substance, alcohol: 21, acid: 7, ethyl: 7, liqu 

2 10 0 Vodka, Beverage, Known, alcohol: 9, percent: 5, beverage: 5, use: 3, l 

3 6 0 Gasohol, Blend, Part, fuel:5, alcohol:2, methanol:2, combustion: 2 

4 4 0 Marijuana, Mixture, Leave, drug:3, alcohol:3, syndrome:3, psychoac 
Passes: 334, best pass: 158, best score: 0.307, worst score: 0.132 

Cluster 0, has 15 hits: '(OTHER), blood:3, vitamin:2, tissue:2, poison:2, sugar 
Birth Defects, Disorder, Structure 
Cancer, Medicine, Growth 
Corn, Maize, Cereal 
Crop Farming, Cultivation, Plant 
First Aid, Emergency, Measure 
Fungi, Group, Organism 
Liver, Organ, Vertebrate 
Nutrition, Human, Science 
Paint, Varnish, Liquid 
Pennsylvania, Full, Commonwealth 
Poison, Substance, Produce 
Thermometer, Instrument, Measure 

Wine, Beverage, Juice 

Wood, Substance, Trunk 
Cluster 1, has 22 hits: 'alcohol:21, acid:7, ethyl:7, liquid:*, example:3, chemi 

Acetaldehyde, Volatile, Liquid 

Antifreeze, Chemical, Substance 

Azeotropic Mixture, Solution, Ratio 

Butyl Alcohol, Chemical, Formula 

Cannizzaro, Stanislao, Italian 

Disease, Medicine, Health 

Ester, Chemistry, Compound 

Ether, Chemistry, Ethyl 

Fermentation, Chemical, Change 

Formaldehyde, Compound, Carbon 

Glycerin, Glycerol, C3h8o3 

Gum, Substance, Plant 

Iodine, Element, Symbol 

Lipid, Group, Substance 

Salicylic Acid, White, Solid 

Solution, Chemistry, Mixture 

Tannin, Acid, Name 

Turpentine, Name, Semifluid 

Vinegar, Condiment, Preservative 

Wax, Name, Ester 

Whiskey, Liquor, Mash 

Zymology, Zymurgy, Biochemistry 
Cluster 2, has 10 hits: 'alcohol: 9, percent: 5, beverage: 5, use: 3, liquor: 3, dist 

Beer, Term, Beverage 

Cider, Sweet, Juice 

Cosmetic, Term, Preparation 

Distillation, Process, Liquid 

Distilled Liquors, Beverage, Alcohol 

Gin, Liquor, Grain 

Liqueur, Beverage, Spirit 

Police, Agency, Community 

Prohibition, Ban, Manufacture 

Vodka, Beverage, Known 
Cluster 3, has 6 hits: 'fuel: 5, alcohol: 2, methanol: 2, combustion: 2, coal: 2, eng 

Alcohol, Arabic, Al-kuhul 

Automobile, Greek, Auto 

Combustion, Process, Oxidation 

Energy Supply, World, Resource 

Gasohol, Blend, Part 

Rocket, Term, Propulsion 
Cluster 4, has 4 hits: 'drug: 3, alcohol: 3, syndrome: 3, psychoactive drugs: 2, mar 

Alcoholism, Illness, Ingestion 
Drug Dependence, State, Compulsion 
Marijuana, Mixture, Leave 
Psychoactive Drugs, Chemical, Substance 

Clustering file bulb.txt 
Non-empty clusters: 5 
Clusters: 5 

# Hits Vals Seed, Value: Count 

0 9 0 (OTHER), plant: 3, united states: 2, seed: 2, gardening: 2, flower: 2 

1 10 0 Radiometer, Instrument, Intensity, bulb: 7, light: 4, tuber: 3, stem: 

2 3 0 Electric Lighting, Illumination, Mean, lamp: 3, glass: 2, neon: 2, ar 

3 5 0 Autumn Crocus, Name, Herb, bulb: 5, liliaceae: 4, herb: 3, lily: 3, pi 

4 6 0 Hygrometer, Type, Instrument, temperature: 4, atmosphere: 3, point: 3 
Passes: 598, best pass: 333, best score: 0.491, worst score: 0.208 

Cluster 0, has 9 hits: '(OTHER), plant:3, united states:2, seed:2, gardening:2, 

Disease, Plant, Deviation 

Gardening, Cultivation, Plant 

Garlic, Name, Herb 

Genetics, Study, Trait 

Gopher, French, Gauffre 

Horticulture, Latin, Hortu 

Peanut Worm, Name, Small 

Spice, Flavoring, Part 

Technology, Term, Process 
Cluster 1, has 10 hits: 'bulb: 7, light: 4, tuber: 3, stem: 3, rhizome: 3, electron: 2 

Bulb, Mass, Leave 

Edison, Township, Middlesex County 
Edison, Thomas Alva, Inventor 
Onion, Name, Herb 

Photoelectric Cell, Phototube, Electron 

Photography, Technique, Permanent 

Radiometer, Instrument, Intensity 

Rhizome, Stem, Organ 

Tuber, Stem, Plant 

Ray, Radiation, Wavelength 
Cluster 2, has 3 hits: 'lamp:3, glass:2, neon:2, arc:2, bulb:2, argon:2, light:2 

Argon, Element, Symbol 

Electric Lighting, Illumination, Mean 

Neon Lamp, Glass, Bulb 
Cluster 3, has 5 hits: 'bulb:5, liliaceae:4, herb:3, lily:3, pistil:2, height:2, 

Autumn Crocus, Name, Herb 

Hyacinth, Plant, Genu 

Soap Plant, Amole, Native 

Star-of-bethlehem, Name, Herb 

Tuberose, Herb, Polianth 
Cluster 4, has 6 hits: 'temperature: 4, atmosphere: 3, point: 3, humidity: 2, bulb: 2 

Blood Pressure, Pressure, Blood 

Humidity, Moisture, Content 

Hygrometer, Type, Instrument 

Meteorology, Study, Atmosphere 

Thermometer, Instrument, Measure 
Vapor, Physic, Term 



Clustering file columbus.txt 
Non-empty clusters: 7 
Clusters: 7 

# Hits Vals Seed, Value: Count 

0 4 0 (OTHER), century: 2 

1 4 0 Pinzn, Name, Family, expedition: 3, voyage: 2, hispaniola: 2, pinta: 2 

2 5 0 Puerto Rico, Commonwealth, Spanish Estado Libre Asociado, spanish: 

3 2 0 Samana Cay, Island, Bahama, atlantic ocean: 2, landfall: 2, san salv 

4 6 0 Mississippi, East South Central, U.S., state: 5, river: 3, city: 3, a 

5 5 0 Santiago, Dominican Republic, Name, cacao: 3, city: 3, caribbean: 2, 

6 4 0 South America, Continent, Asia, death valley: 2, south: 2, slavery: 2 
Passes: 614, best pass: 65, best score: 0.520, worst score: 0.189 

Cluster 0, has 4 hits: '(OTHER), century: 2' 

American Literature, Literature, English 

Coin, Geography, City 

Europe, Continent, World 

Knight, Columbu, Organization 
Cluster 1, has 4 hits: 'expedition: 3, voyage: 2, hispaniola: 2, pinta: 2, ship: 2' 

Columbu, Christopher, Italian Cristoforo Colombo 

Pinzn, Name, Family 

Ship, Type, Construction 

Velzquez, Diego, Soldier 

Cluster 2, has 5 hits: 'spanish: 4, island: 3, spain: 2, de: 2, christopher columbus 

Bobadilla, Francisco, De 

Cuba, Island, West Indies 

Dsirade, Island, West Indies 

Ferdinand V, The Catholic, King 

Puerto Rico, Commonwealth, Spanish Estado Libre Asociado 
Cluster 3, has 2 hits: 'atlantic ocean:2, landfall:2, san salvador:2, island:2 

Samana Cay, Island, Bahama 

San Salvador, Island, Watling Island 
Cluster 4, has 6 hits: 'state:5, river:3, city:3, american civil war:2, ohio:2, 

Columbu, Georgia, City 

Columbu, Mississippi, City 

Columbu, Ohio, City 

Georgia, state, South Atlantic 

Mississippi, East South Central, U.S. 

Ohio, East North Central, U.S. 
Cluster 5, has 5 hits: 'cacao:3, city:3, caribbean:2, dominican:2, santiago:2, c 

Columbu, Indiana, City 

Santiago, Dominican Republic, Name 

Santo Domingo, Trujillo, City 

Spanish Town, City, Jamaica 

Tobago, Republic, Commonwealth 
Cluster 6, has 4 hits: 'death valley:2, south:2, slavery:2, brazil:2, continent: 

Black, America, Immigration 
North America, Continent, Canada 
South America, Continent, Asia 
United States, America, Republic 



Clustering file dualism.txt 
Non-empty clusters: 5 
Clusters: 5 

# Hits Vals Seed, Value: Count 

0 2 0 NONE 

1 5 0 Dualism, Philosophy, Theory, mind: 5, philosopher: 5, philosophy: 3, 

2 3 0 Devil, Hebrew, Belief, evil: 3, god: 3, good: 2, human: 2, middle ages 

3 3 0 Paulician, Church, History, dualism: 3, sect: 3, bogomils:2, old tes 

4 2 0 Docetism, Christian, Heresy, doctrine:2, human: 2 
Passes: 1050, best pass: 312, best score: 1.003, worst score: 0.397 
Cluster 0, has 2 hits: 

Austria, German, sterreich 
Zoroastrianism, Religion, Persia 
Cluster 1, has 5 hits: 'mind: 5, philosopher: 5, philosophy: 3, matter: 3, universe: 

Dualism, Philosophy, Theory 

Metaphysics, Branch, Philosophy 

Monism, Greek, Mono 

Occasionalism, Term, System 

Philosophy, Greek, Philosophia 
Cluster 2, has 3 hits: 'evil: 3, god: 3, good: 2, human: 2, middle ages: 2, middle ea 

Albigens, Follower, Single 

Devil, Hebrew, Belief 

Evil, Wrong, Harm 

Cluster 3, has 3 hits: 'dualism: 3, sect: 3, bogomils:2, old testament: 2, century: 

Basilide, Teacher, Alexandria 

Bogomils, Member, Sect 

Paulician, Church, History 
Cluster 4, has 2 hits: 'doctrine: 2, human: 2' 

Docetism, Christian, Heresy 

Neoplatonism, Designation, Doctrine 



Clustering file infant.txt 
Non-empty clusters: 7 
Clusters: 7 

# Hits Vals Seed, Value: Count 

0 4 0 NONE 

1 3 0 Gesell, Arnold Lucius, Psychologist, infant: 3, development: 2 

2 2 0 Incubator, Apparatu, Chamber, growth: 2 

3 2 0 Pregnancy, Childbirth, Term, birth: 2, pregnancy: 2, infant: 2, child 

4 2 0 Hondura, Republic, Central America, country: 2, 1980s: 2 

5 3 0 Baptism, Greek, Baptein, rite: 2, baptism: 2 

6 2 0 Japan, Japanese Dai, Great, manchuria: 2, government: 2, party: 2 
Passes: 835, best pass: [illegible], best score: 0.795, worst score: [illegible] 

Cluster 0, has 4 hits: 

Free Trade, Interchange, Frontier 

Human, Name, Individual 

Perception, Process, Stimulation 

Scotland, Division, Kingdom 
Cluster 1, has 3 hits: 'infant: 3, development: 2' 

Gesell, Arnold Lucius, Psychologist 

Infancy, Period, Birth 

Sudden Infant Death Syndrome, Sid, Death 
Cluster 2, has 2 hits: 'growth: 2' 

Incubator, Apparatu, Chamber 

Population, Term, Human 
Cluster 3, has 2 hits: 'birth: 2, pregnancy: 2, infant: 2, childbirth: 2, women: 2' 

Obstetrics, Branch, Medicine 

Pregnancy, Childbirth, Term 
Cluster 4, has 2 hits: 'country: 2, 1980s: 2' 

Hondura, Republic, Central America 

Sierra Leone, Nation, Africa 
Cluster 5, has 3 hits: 'rite:2, baptism:2' 

Baptism, Greek, Baptein 

Circumcision, Removal, Part 

Mennonite, Religious, Group 
Cluster 6, has 2 hits: 'manchuria: 2, government: 2, party: 2' 

China, Chinese Zhonghua Renmin Gongheguo, People Republic 
Japan, Japanese Dai, Great 



Clustering file israel.txt 
Non-empty clusters: 4 
Clusters: 4 

# Hits Vals Seed, Value: Count 

0 22 0 (OTHER), government: 6, war: 4, century: 3, french revolution: 3, coun 

1 66 0 Judah, Old Testament, Name, israel: 64, judah: 20, old testament: 20, 

2 39 0 Nasser, Gamal Abdel, Egyptian, israel: 32, arab: 26, israeli: 20, pal 

3 11 0 Song, Solomon, Book, book: 10, old testament: 9, israel: 9, chap: 5, b 
Passes: 127, best pass: 117, best score: 0.213, worst score: 0.083 

Cluster 0, has 22 hits: '(OTHER), government: 6, war: 4, century: 3, french revolut 
Achille Lauro, Italian, Cruise 
Anti-semitism, Social, Agitation 
Asia, Continent, Island 
Assyria, Ashur, Ashshur 
Bahai, Persian, Glory 
Buber, Martin, Religious 
Cabala, Hebrew, Tradition 
Crusade, Expedition, Undertaken 
Eschatology, Discourse, Last 
Espionage, Collection, Information 
Iran, Islamic Republic, Republic 
Jewish Art, Architecture, Jew 

Jewish Music, Religious, Music 

Nationalism, History, Movement 

Portuguese Literature, Literature, Portuguese 

Refugee, Person, Country 

Romania, Republic, Europe 

Saudi Arabia, Monarchy, Southwest Asia 

Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski 
United Nations, Organization, Nation-state 
United States, America, Republic 
Woman Suffrage, Right, Women 
Cluster 1, has 66 hits: 'israel: 64, judah: 20, old testament: 20, king: 18, bc: 12, 
Abner, Old Testament, Cousin 
Ahab, King, Israel 
Amaziah, Hebrew, King 
Ammonite, People, Region 
Amo, Book, Old Testament 
Angel, Greek, Aggelo 
Apostle, Greek, Apostolo 
Ashqelon, Town, Palestine 
Balaam, Old Testament, Prophet 
Kokhba, Simon, Name 
Bene Israel, Community, Jew 
Ben-zvi, Itzhak, Second 
Bethlehem, Jordan, Hebrew 
Bible, Holy Bible, Book 
Carmel, Mount, Mountain 
Diaspora, Greek, Dispersion 
David, King, Bc 
Edom, Old Testament, Times 
Elat, Eilat, City 
Elia, Century, Bc 

Elisha, Old Testament, See 

Ephraim, Hebrew, Old Testament 

Esdraelon, Plain, Jezreel 

Ezekiel, Book, Old Testament 

Falasha, Sect, Ethiopia 

Galilee, Galil, Circle 

Gideon, Hebrew, Hewer 

Habima Theater, Former, Name 

Hebron, City, Israeli-occupied Jordan 

Herzog, Chaim, President 

High Priest, Hierarchy, Head 

Holon, City, Israel 

Israel, Kingdom, Hebrew 

Jacob, Old Testament, Patriarch 

Joash, Name, King 

Jehoshaphat, Hebrew, Jehovah 

Jehu, Hebrew, Jehovah 

Jeremiah, Book, Old Testament 
Jeroboam I, Old Testament, See 

Jeroboam Ii, King, Israel 

Jew, Usage, Hebrews 

Jezebel, Tyrian, Princess 

Jonathan, Old Testament Books, Samuel 

Judah, Old Testament, Name 

Judaism, Culture, Jew 

Justification, Theology, Way 

King, Book, Old Testament 

Lost Tribes, History, Tribe 

Manasseh, Son, Old Testament 

Meir, Golda, Israeli 

Michael, Hebrew, God 

Moab, Country, Hill 

National Jewish Welfare Board, National, Agency 
Negeb, Region, Middle East 
Philistine, Inhabitant, Region 
Putnam, Israel, Soldier 
Ramat Gan, City, Central 
Rehoboam, King, Judah 
Samuel, Book, Old Testament 
Saul, King, Israel 
Sharon, Plain, Israel 
Shema, Hebrew, Word 
Solomon, King, Israel 
Tiberia, Lake, Sea 
Weizmann, Chaim, Long-time 
Zangwill, Israel, English 
Cluster 2, has 39 hits: 'israel:32, arab:26, israeli:20, palestine:11, egypt:11, 
Husein, King, Jordan 
Acre, Akko, Seaport 
Agnon, Shmuel Yosef, Israeli 
Amman, Rabbah Ammon, Philadelphia 
Arab League, Name, League 
Arafat, Yasir, Palestinian 
Aren, Moshe, Israeli 
Menachem, Israeli, Prime 
Ben-gurion, David, Israeli 
Damascu, Arabic Dimashq, Ash-sham 
Dayan, Moshe, Israeli 

Egypt, Arab Republic, United Arab Republic 

Gaza, Arabic Ghazze, City 

Golan Heights, Region, Syria 

Haifa, City, Seaport 

Hebrew Literature, Literature, Jew 

Iraq, Irak, Republic 

Israel, Republic, Middle East 

Jerusalem, Arabic, Al-qud 

Jordan, River, Middle East 

Jordan, Hashemite Kingdom, Arabic 

Kibbutz, Village, Far 

Lebanon, Arabic Lubnan, Republic 

Libya, Full, Socialist People Libyan Arab Jamahiriyah 
Middle East, Region, Geography 
Nasser, Gamal Abdel, Egyptian 
Palestine, Region, Extent 
Palestine Liberation Organization, Plo, Body 
Sadat, Egyptian, Military 
Six-day War, Conflict, June 
Suez Canal, Waterway, Running 
Syria, Arabic Suriyah, Al-arabiyah 
Tel Aviv-jaffa, Tel Aviv-yafo, City 
Terrorism, International, Use 
Tunisia, Republic, Africa 
West Bank, Area, West 
Yom Kippur War, Conflict, Israel 
Zionism, Movement, People 
Zionist Organization, America, Zoa 
Cluster 3, has 11 hits: 'book: 10, old testament: 9, israel:9, chap: 5, be: 5, proph 
Dead Sea Scrolls, Collection, Hebrew 
Hosea, Book, Old Testament 
Isaiah, Book, Old Testament 
Joshua, Book, Old Testament 
Judge, Book, Old Testament 
Micah, Book, Old Testament 
Number, Book, Old Testament 
Obadiah, Book, Old Testament 
Song, Solomon, Book 
Wisdom, Solomon, Book 
Zechariah, Book, Old Testament 



Clustering file marx.txt 
Non-empty clusters: 6 
Clusters: 6 

# Hits Vals Seed, Value: Count 

0 2 0 (OTHER), german:2, germany:2, east:2, baltic sea:2 

1 3 0 Hegel, G, W, philosopher: 3, philosophy: 2 

2 4 0 Bolshevism, Doctrine, Theory, communist: 4, lenin: 4, revolution: 3, 

3 4 0 Marx Brothers, 20th-century, Comedian, marx:4, socialism: 2, engels 

4 4 0 Communist Manifesto, German Manifest, Partei, capitalist: 3, class: 

5 6 0 Ideology, System, Concept, social:3, marx:3, labor:2, world war ii 
Passes: 722, best pass: 675, best score: 0.663, worst score: 0.248 
Cluster 0, has 2 hits: '(OTHER), german:2, germany:2, east: 2, baltic sea: 2 

Germany, Country, Europe 

Germany, German Democratic Republic, Gdr 
Cluster 1, has 3 hits: 'philosopher: 3, philosophy: 2' 

Hegel, G, W 

Philosophy, Greek, Philosophia 
Political Theory, Subdivision, Science 
Cluster 2, has 4 hits: 'communist: 4, lenin: 4, revolution: 3, communism: 2, governm 

Bolshevism, Doctrine, Theory 

Communism, Concept, System 

International, Name, Socialist 

Socialism, Doctrine, Movement 
Cluster 3, has 4 hits: 'marx:4, socialism: 2, engels:2' 

Bernstein, Eduard, German Social Democratic 

Economics, Science, Production 

Engels, Friedrich, German 

Marx Brothers, 20th-century, Comedian 
Cluster 4, has 4 hits: 'capitalist: 3, class: 3, capitalism: 2, communist: 2, bourge 

Bourgeoisie, Resident, European 

Capitalism, System, Individual 

Communist Manifesto, German Manifest, Partei 

Marx, Karl, German 

Cluster 5, has 6 hits: 'social: 3, marx: 3, labor: 2, world war ii: 2, german: 2, cen 
Ideology, System, Concept 
Karl-marx-stadt, Former, Name 
Kautsky, Karl Johann, German Marxist 
Lassalle, Ferdinand, German 
Sociology, Science, Deal 
Wage, Theory, Labor 



Clustering file muslim.txt 
Non-empty clusters: 4 
Clusters: 4 

# Hits Vals Seed, Value: Count 

0 41 0 (OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam:4 

1 20 0 Philippine, Republic, Pacific Ocean, 1980s: 17, country: 8, governme 

2 40 0 Kashgar, Kashi, Kaxgar, muslim: 38, india: 8, muhammad: 7, jerusalem: 

3 11 0 Mathematics, Study, Relationship, century: 11, art: 3, france: 3, arc 
Passes: 146, best pass: 47, best score: 0.210, worst score: 0.124 

Cluster 0, has 41 hits: '(OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam: 
Alfonso Viii, King, Castile 
Arabia, Desert, Peninsula 
Arabic Literature, Literature, People 
Archaeology, Greek, Archaic 
Averros, Arabic, Abu 
Black Muslims, Religious, Organization 
Borneo, Island, World 
Chess, Game, Skill 
Christianity, World, Religion 
Chronology, Science, Division 
Concubinage, Term, World 
Costume, Clothing, People 
Demon, Usage, Spirit 
Gandhi, Mohandas Karamchand, Mahatma Gandhi 

Ghana, Kingdom, West African 

Hegira, Hejira, Arabic 

Iraq, Irak, Republic 

Jacobite Church, Christian, Group 

Java, Island, Malay Archipelago 

Jew, Usage, Hebrews 

Jordan, Hashemite Kingdom, Arabic 

Judaism, Culture, Jew 

Karbala, City, Iraq 

Mahdi, Arabic, Mahdiy 

Medina, Medinat-en-nabi, City 

Middle East, Region, Geography 

Nehru, Indian, Nationalist 

Orthodox Church, Major, Branch 

Philosophy, Greek, Philosophia 

Pottery, Clay, Firing 

Punjab, Region, River 

Saudi Arabia, Monarchy, Southwest Asia 

Shiite, Arabic, Partisan 

Sikhs, Follower, Religion 

Sudan, Republic, Africa 

Trigonometry, Branch, Mathematics 

Tobago, Republic, Commonwealth 

Tunisia, Republic, Africa 

Turkey, Republic, Turkish Trkiye Cumhuriyeti 
Vijayanagar, Kingdom, India 
Cluster 1, has 20 hits: '1980s: 17, country: 8, government: 7, spanish: 5, arab: 4 
Afghanistan, Persian Afghntstn, Republic 
Bangladesh, Full, People Republic 
Berber, Name, Language 
Cameroon, Republic, Africa 
Chad, Republic, Central 
Ethiopia, Abyssinia, Republic 
Gambia, Republic, Commonwealth 
Gibraltar, Dependency, Promontory 
Indonesia, Republic, Island 
Iran, Islamic Republic, Republic 
Israel, Republic, Middle East 
Kenya, Republic, Africa 

Libya, Full, Socialist People Libyan Arab Jamahiriyah 
Morocco, Arabic, Al-mamlakah 
Nigeria, Federal Republic, Republic 
Pakistan, Islamic Republic, Republic 
Philippine, Republic, Pacific Ocean 
Republic, Europe, Portion 
Spain, Spanish Espaa, Monarchy 
Syria, Arabic Suriyah, Al-arabiyah 
Cluster 2, has 40 hits: 'muslim:38, india:8, muhammad:7, jerusalem:5, delhi:4 
Fakhruddin Ali, Fifth, President 
Algeria, French Algrie, Popular Republic 
Allah, Name, Supreme Being 
Almeida, Francisco, De 
Almoravid, Berber, Dynasty 
Asia, Continent, Island 
Babism, Religion, Offshoot 
Balewa, Sir Abubakar Tafawa, Minister 
Region, Part, Subcontinent 
Caliphate, Office, Realm 
Crusade, Expedition, Undertaken 
Delhi, Old Delhi, City 
Delhi Sultanate, Muslim, State 
Dervish, Turkish, Darvsh 
Fakir, Arabic, Faqir 
Farabl, Tarkhan, Al-farabi 
Gansu, Kansu, Province 
Ghazali, Name, Abu Hamid Muhammad 
India, Republic, Hindi Bharat 
Sir Muhammad, Pakistani, Philosopher 
Islam, World, Religion 
Islamic Music, Vocal, Art 
Jammu, Kashmir, Known 
Jerusalem, Arabic, Al-qud 
Jinnah, Muhammad Ali, Leader 
Kashgar, Kashi, Kaxgar 
Kharijite, Arabic, Kharawrij 
Lebanon, Arabic Lubnan, Republic 
Malaysia, Monarchy, Commonwealth 
Malcolm X, Leader, Omaha 
Mufti, Title, Lawyer 
Palestine, Region, Extent 
Pilgrim, Place, Intent 
Relic, Usage, Body 
Roger I, Norman, Conqueror 
Saladin, Leader, Jerusalem 
Shivaji Bhonsle, Founder, India Maratha State 
Tughluq, Muhammad, Sultan 
Tuni, Tune, City 
Umar, Al-hajj, West African 
Cluster 3, has 11 hits: 'century:11, art:3, france:3, architecture: 2, sculpture: 
Africa, Continent, Island 
Europe, Continent, World 

France, French Rpublique Franaise, Republic 

Gypsy, People, Heritage 

History, Historiography, Sense 

Indian Art, Architecture, Art 

Indian Literature, Literature, Language 

Islamic Art, Architecture, Art 

Library, Repository, Form 

Mathematics, Study, Relationship 

Portraiture, Representation, Art 



Clustering file pope.txt 
Non-empty clusters: 3 
Clusters: 3 

i Hits Vals Seed, Value: Count 



0 50 0 (OTHER), church:12, henry:8, king:7, english:6, roman:6, governmen 

1 138 0 Benedict Xiv, Pope, Moderation, pope:138, church:28, rome:26, coun 
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2 12 0 Angelico, Fra, Italian, florence:10, medici:5, florentine:4, domin 
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Passes: 86, best pass: 34, best score: 0.149, worst score: 0.082 

Cluster 0, has 50 hits: '(OTHER), church:12, henry:8, king:7, english:6, roman:6 

Aquina, Saint Thomas, Angelic Doctor 

Borgia, Cesare, Italian 

Bruno, Saint, Carthusian 

Bulgaria, Full, People Republic 

Canon Law, Greek, Kanon 

Carpini, Giovanni, De 

Carroll, John, American Roman Catholic 

Christianity, World, Religion 

Church, England, Anglican Church 

Civil War, Conflict, United States 

Conrad Iii, King, Germany 

Corsica, French Corse, Island 

Counter Reformation, Movement, Roman Catholic 

Couplet, Poetry, Term 

Cranmer, Thoma, Archbishop 

Cyril, Methodiu, Saint 

Demarcation, Line, Boundary 

Duns Scotus, John, Theologian 

Easter, Festival, Resurrection 

England, Latin Anglia, Portion 

English Literature, Literature, England 

Erigena, John Scotus, Scholar 

Este, Italian, Family 

Europe, Continent, World 

Felix V, Last, Antipope 

Ferdinand I, Naple, King 

Feuillant, French, Organizations-one 

Finland, Finnish Suomi, Republic 

Fisher, Saint John, English Christian 

France, French République Française, Republic 

Gardiner, Stephen, English 

Germany, Country, Europe 

Henry Viii, King, England 

Henry Iv, France, Bourbon 

Holy Roman Empire, Entity, Europe 

Hungary, Hungarian Magyarország, Republic 

Ireland, Geography, Island 

Italian Italia, Republic, Europe 

Knight, Saint John, Jerusalem 

Lincoln, Abraham, President 

Loyola, Saint Ignatius, Spanish Inigo 

Lutheranism, Protestant, Denomination 

Mary, Virgin Mary, Mother 

Mendelssohn, Mos, German 

Middle Ages, Period, European 

Modernism, Theology, Philosophy 

Neri, Saint Philip, Italian 






Orthodox Church, Major, Branch 
Poland, Republic, Polska Rzeczpospolita 
Pole, Reginald, English Roman Catholic 
Cluster 1, has 138 hits: 'pope:138, church:28, rome:26, council:23, papacy:23, h 
Adrian I, Pope, Power 
Adrian Iv, Pope, Englishman 
Adrian Vi, Pope, Dutchman 






Alexander Iii, Pope, Authority 

Alexander Vi, Pope, Worldliness 

Algardi, Alessandro, Italian 

Antonelli, Giacomo, Italian 

Arnold, Brescia, 1100-c 

Augustinian, Order, Roman Catholic 

Bacon, Roger, English Scholastic 

Basel, Council, Middle Ages 

Bembo, Pietro, Italian 

Benedict Viii, Pope, Reformer 

Benedict Ix, Pope, 1032-44 

Benedict Xiii, Antipope, Avignon 

Benedict Xiv, Pope, Moderation 

Benedict Xv, Pope, Church 

Bernard, Clairvaux, Saint 

Bonaventure, Saint, Theologian 

Boniface, Saint, English Benedictine 

Boniface Viii, Pope, Power 

Boniface Ix, Pope, Papal States 

Bossuet, Jacques Bénigne, French Roman Catholic 

Bull, Letter, Document 

Bull Run, Battle, Manassa 

Callistu, Calixtus I, Saint 

Callistus Ii, Calixtus Ii, Pope 

Callistus Iii, Calixtus Iii, Pope 

Canonization, Roman Catholic, Church 

Canossa, Village, Reggio 

Cardinal, Title, Latin 

Catherine, Aragón, Queen 

Catherine, Siena, Saint 

Cedar Mountain, Battle, Military 

Celestine V, Saint, Pope 

Celestine Iii, Pope, Born Giacinto Bobo 

Censorship, Supervision, Control 

Chalcedon, Council, Emperor 

Charlemagne, Latin Carolus Magnus, Charle 

Charles V, Holy Roman Empire, Holy Roman 

Church, State, Relationship 

Clement V, Pope, Avignon 

Clement Vi, Pope, Church 

Clement Vii, Pope, Pontificate 

Clement Vii, Antipope, Great Schism 

Clement Viii, Last, Pope 
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Clement Xiv, Pope, Jesuit 

Conciliar Theory, Doctrine, Superiority 

Conclave, Latin, Cum 

Constance, Council, City 

Coptic Church, Christian, Church 

Council, Assembly, Doctrine 

Crusade, Expedition, Undertaken 

Damasus I, Saint, Pope 

Damian, Saint Peter, Doctor 

Doctor, Church, Christian 

Döllinger, Johann Joseph Ignaz, Von 

Ecumenical Movement, Movement, Cooperation 

Edmund, Abingdon, Saint 
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Elector, German Imperial, German Kurfürsten 
Eugene Iii, Pope, Cistercian 
Eugene Iv, Pope, Dispute 
Formosu, Pope, Trial 
Franciscan, Order, Friars Minor 

Frederick I, Holy Roman Empire, Frederick Barbarossa 

Frederick Ii, Holy Roman Empire, Holy Roman 

Gallicanism, History, Combination 

Gregory I, Saint, Pope 

Gregory Ii, Saint, Pope 

Gregory Vii, Saint, Pope 

Gregory Ix, Pope, Inquisition 

Gregory Xi, Pope, Return 

Guiscard, Robert, Norman 

Henry Ii, Holy Roman Empire, Henry The Saint 

Henry Iv, Holy Roman Empire, Holy Roman 

Henry V, Holy Roman Empire, German 

Hippolytu, Rome, Saint 

Honorius I, Pope, Heretic 

Infallibility, Theology, Doctrine 

Innocent Iii, Pope, Pop 

Innocent Iv, Pope, Dominion 

Innocent Xi, Pope, King Louis Xiv 

Inquisition, Institution, Papacy 

Interdict, Roman Catholic, Church 

Investiture Controversy, Dispute, Church 

Jesuit, Society, Jesu 

Joan, Pope, Female 

John Ii, Pope, Born Mercurius 

John Viii, Pope, Ablest 

John Xii, Pope, Boy Pope 

John Xxi, Pope, Pontiff 

John Xxii, Pope, Second 

John Xxiii, Antipope, Born Baldassare Cossa 

John Xxiii, Pope, Era 

John, John Lackland, King 

John Paul I, Pope, Born Albino Luciani 
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John Paul Ii, Pope, Non-italian 

Jubilee, Jew, Sabbatical 

Julius Ii, Pope, Reign 

Kulturkampf, German, Culture 

Langton, Stephen, English 

Lateran Councils, Council, Roman Catholic 

Lateran Treaty, Designation, Agreement 

Leo Iii, Saint, Pope 

Leo Ix, Saint, Pope 

Leo X, Pope, Renaissance 

Leo Xiii, Pope, Modern 

Louis Iv, German, Ludwig Iv 

Lyon, Council, Church 

Martin I, Saint, Pope 

Martin Iv, Pope, Born Simon 

Martin V, Pope, Election 

Molino, De, Spanish Roman Catholic 

Nicholas Iii, Pope, Papal States 

Nichola, Cusa, German 
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Occam, William, 1285-1349 
Otto Iii, Holy Roman, Emperor 
Otto Iv, Otto, Brunswick 
Papacy, Office, Pope 

Papal States, Church, Pontifical States 

Paschal Ii, Pope, Reign 

Paul V, Pope, Born Camillo Borghese 

Paul Vi, Pope, Second Vatican Council 

Pepin, Short, Mayor 

Peter Pence, Offering, Pope 

Philip Iv, France, The Fair 

Photiu, 820-91, Patriarch 

Pico Della Mirandola, Giovanni, Conte 

Pius Ii, Pope, Writer 

Pius Iv, Pope, Conclusion 

Pius V, Saint, Pope 

Pius Vi, Pope, Reign 

Pius Vii, Pope, Napoleon 

Pius Ix, Pope, Pontificate 

Pius X, Saint, Pope 

Pius Xi, Pope, Path 

Pius Xii, Pope, World War Ii 

Pope, Latin, Papa 

Cluster 2, has 12 hits: 'florence:10, medici:5, florentine:4, dominican:3, churc 
Alberti, Leon Battista, Italian 
Albertus Magnus, Saint, Albert 
Angelico, Fra, Italian 
Cellini, Benvenuto, Florentine 
Dante Alighieri, Italian, Poet 
Dominican, Friars Preachers, Member 
Ferrara-florence, Council, Basel-ferrara-florence 
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Florence, Italian Firenze, Florentia 
Guicciardini , Francesco, Italian 
Leonardo, Da, Vinci 
Medici, Lorenzo, De 
Michelangelo, Creator, History 



Clustering file sound.txt 
Non-empty clusters: 5 
Clusters: 5 

I Hits Vals Seed, Value: Count 



0 68 0 (OTHER), music:10, american civil war:6, state:6, bass:5, century: 

1 57 0 Mach Number, Aerodynamics, Mechanic, sound:51, instrument:8, pitch 

2 8 0 Letter, Vowel, English, sound:6, long:3, letter:3, sign:2, atlanti 

3 19 0 Linguistics, Study, Language, language:14, english:9, speech:6, so 

4 11 0 Vowel, English, Alphabet, sound:11, alphabet:9, letter:9, hierogly 
Passes: 103, best pass: 74, best score: 0.173, worst score: 0.072 

Cluster 0, has 68 hits: '(OTHER), music:10, american civil war:6, state:6, bass: 
Amati, Family, Italian 

American Indian Languages, Language, People 
American Indians, People, America 
Audiovisual Education, Planning, Preparation 
Band, Ensemble, Brass 
Transaction, Service, Consumer 
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Bird, Name, Member 

Bremerton, City, Kitsap County 

British Columbia, Province, Canada 

Bronx, Borough, New York City 

Building Construction, Procedure, Erection 

Circulatory System, Anatomy, Physiology 

Communication, Method, Receiving 

Connecticut, New England, United States 

Copyright, Body, Right 

Currency, Economics, Term 

Deep-sea Exploration, Investigation, Chemical 

Bass, Member, Violin 

Drama, Dramatic Arts, Form 

Edison, Thomas Alva, Inventor 

Encyclopedia, Encyclopaedia, Greek 

Firework, Device, Material 

Floor, Floor Coverings, Ceiling 

Folk Dance, Dance, Member 

Folk Music, Music, Performance 

Frequency, Term, Science 

Golden Globe Awards, Motion, Picture 

Harmony, Music, Combination 

Harpsichord, Italian, Cembalo 

Insect, Name, Animal 

Jazz, Type, Music 
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Jet Propulsion, Thrust, Imparting 

Mississippi, East South Central, U.S. 

Motion Picture Arts, Science, Academy 

Music, Vocal, Part 

Music, Western, Europe 

Musical Form, Arrangement, Element 

Mystic, Village, Stonington 

Navigation, Science, Position 

Haven, City, New Haven County 

North Carolina, South Atlantic, U.S. 

Ocean, Oceanography, Body 

Orchestra, Ensemble, Instrument 

Orchestration, Art, Musical 

Philosophy, Greek, Philosophia 

Pianoforte, Keyboard, Musical 

Social Dance, Term, Dance 

Radio, System, Communication 

Rhode Island, Full, State 

Scale, Music, Italian 

Scott, Robert Falcon, Officer 

Seattle, City, Seat 

Seward Peninsula, Peninsula, Alaska 

Snake, Reptile, Name 

Sonata, Italian, Sonare 

Tacoma, City, Seat 

Telephone, Communication, Instrument 

Television, TV, Transmission 

Theater Production, Mean, Form 

United States, America, Republic 

Valdez, City, Alaska 

Video Recording, Process, Recording 
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Viol, Instrument, Century 
Washington, State, U.S. 
Wave Motion, Physic, Mechanism 
Whale, Mammal, Order 
Yachting, Operation, Boat 
Zither, Instrument, String 
Cluster 1, has 57 hits: 'sound:51, instrument:8, pitch:7, string:5, recording:5, 
Acoustics, Greek, Akouein 
Aerodynamics, Branch, Mechanic 
Airplane, Craft, Action 
Albemarle Sound, Inlet, Atlantic Ocean 
Bell, Instrument, Percussion 
Chaplin, Charlie, Name 
Clair, René, Name 
Digital Audio Tape, Dat, Tape 
De Forest, Lee, Inventor 
Doppler Effect, Physic, Variation 
Ear, Organ, Hearing 
Edmond, City, Snohomish County 
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Electronic Music, Music, Knowledge 

Exxon Valdez, Oil, Tanker 

Falkland Islands, Islas Malvinas, Island 

Fluid Mechanics, Science, Action 

Grunt, Name, Fish 

Guitar, Instrument, Lute 

Harmonic, Vibration, Primary 

Harp, Instrument, Run 

Hearing, Main, Sense 

Hearing Aid, Device, Sound 

Mach Number, Aerodynamics, Mechanic 

Microphone, Device, Energy 

Midi, Acronym, Musical Instrument Digital Interface 

Motion Picture, Sequence, Photograph 

Motion Pictures, History, Development 

Music, Movement, Sound 

Musical Instruments, Tool, Scope 

Noise, Physic, Signal 

Oboe, Wind, Instrument 

Organ, Instrument, Air 

Petroleum, Oil, Bituminou 

Phonograph, Known, Player 

Physic, Science, Constituent 

Prince William Sound, Inlet, Gulf 

Propeller, Device, Force 

Puget Sound, Arm, Pacific Ocean 

Radiometer, Instrument, Intensity 

Reflection, Physic, Phenomenon 

Singing, Use, Voice 

Sonar, Acronym, Sound Navigation And Ranging 

Sound, Phenomenon, Sense 

Determination, Depth, Body 

Sound Recording, Reproduction, Conversion 

Supersonics, Branch, Physic 

Synthesizer, Computer, Peripheral 

Tone, Music, Sound 

Transformer, Device, Coil 
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Tyndall, John, Physicist 
Ultrasonics, Branch, Physic 
Ventriloquism, Art, Sound 
Violin, Instrument, Member 
Viscount Melville Sound, Arm, Arctic Ocean 
Voiceprint Identification, Method, Person 
Warner Brothers, Motion, Picture 
Xylophone, Greek, Xylon 
Cluster 2, has 8 hits: 'sound: 6, long:3, letter: 3, sign: 2, atlantic ocean: 2, mi: 
Animal Behavior The, Behavior, Animal 
C, English, Romance-language 
Diacritic Mark, Sign, Mark 
Island Sound, Body, Salt 
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Letter, Vowel, English 

Pamlico Sound, Inlet, Atlantic Ocean 

Rhyme, Likeness, Sound 

W, Letter, English 

Cluster 3, has 19 hits: 'language:14, english:9, speech:6, sound:6, word:5, spok 

American English, English, Spoken 

Celtic Languages, Indo-european, Family 

Chinese Language, Language, Chinese 

Cuneiform, Latin, Cuneu 

Deafness, Inability, Definition 

English Language, Medium, Communication 

English Literature, Literature, England 

Etymology, Branch, Linguistics 

Grammar, Branch, Linguistics 

Greek Language, Language, People 

Hieroglyph, Character, System 

Japanese Language, Language, Spoken 

Language, Communication, Being 

Linguistics, Study, Language 

Phonetics, Branch, Linguistics 

Poetry, Form, Expression 

Semantics, Greek, Semantiko 

Versification, Art, Verse 

Writing, Method, Intercommunication 
Cluster 4, has 11 hits: 'sound:11, alphabet:9, letter:9, hieroglyph:8, english:7 

Vowel, English, Alphabet 

Alphabet, Alpha, Beta 

F, Letter, Consonant 

K, Letter, English 

L, Letter, English 

M, Letter, English 

O, Letter, English 

R, Letter, English 

U, 21st, Letter 

X, Letter, English 

Y, Letter, English 



Clustering file strike.txt 
Non-empty clusters: 4 
Clusters: 4 

i Hits Vals Seed, Value: Count 
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0 6 0 (OTHER), electron:2, beam:2, tube:2, television:2 

1 11 0 Gary, City, Lake County, strike:10, united states:3, president:2, 

2 10 0 National Labor Relations Act, Nlra, Law, labor:9, strike:8, union: 

3 15 0 Poland, Republic, Polska Rzeczpospolita, government:11, 1980s:8, w 
Passes: 453, best pass: 208, best score: 0.445, worst score: 0.154 

Cluster 0, has 6 hits: '(OTHER), electron:2, beam:2, tube:2, television:2' 
Baseball, Game, Skill 
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Cathode-ray Tube, Electron, Tube 
Napoleon I, Emperor, French 
Russia, History, Empire 
Television, Tv, Transmission 
Warfare, Use, Force 

Cluster 1, has 11 hits: 'strike: 10, united states: 3, president: 2, injunction: 2, 

Chartism, Reform, Movement 

Coolidge, John, Calvin 

Defense Systems, Defense, Country 

Deb, Eugene Victor, American Socialist 

Dollfuss, Engelbert, Chancellor 

Fault, Geology, Line 

Gary, City, Lake County 

Homestead Strike, Labor, Strike 

Pullman Strike, See, Deb 
Sound, Phenomenon, Sense 

Ueberroth, Peter Victor, Sport 
Cluster 2, has 10 hits: 'labor:9, strike:8, union:7, labor-management relations 

Cleveland, Grover, 22d 

Industrial Workers, World, Former 

International Ladies, Garment Workers, Union 

Knight, Labor, Union 

Labor Relations, Transaction, Determination 
Lockout, Labor, Relation 
National Labor Relations Act, Nlra, Law 
Labor, Relation, Practice 
Strike, Labor, Relation 
Trade Unions, United States, Labor 
Cluster 3, has 15 hits: 'government:11, 1980s:8, war:6, country:4, soviet:3, par 
Colombia, Republic, South America 
France, French République Française, Republic 
Ghana, Country, Africa 
Britain, United Kingdom, Great Britain 
Illinoi, East North Central, U.S. 
Italian Italia, Republic, Europe 
Japan, Japanese Dai, Great 
Northern Ireland, Part, United Kingdom 
Poland, Republic, Polska Rzeczpospolita 
Russian Revolution, Event, Russia 
Spain, Spanish España, Monarchy 
Sweden, Konungariket Sverige, Kingdom 

Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski 
United States, America, Republic 
World War li, Military, Conflict 



Clustering file utah.txt 
Non-empty clusters: 5 
Clusters: 5 
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0 2 0 (OTHER), state: 2 

1 3 0 Utah, University, Institution, utah:3 

2 9 0 City, Davis County, Utah, city:8, utah:8, mormon:5, state:4, name: 

3 3 0 Mormonism, World, Religion, mormonism:3, polygamy:3, smith:3, mormo 

4 7 0 Green, River, Utah, utah:6, colorado:5, mi:4, km:4, river:2, yampa 
Passes: 764, best pass: 515, best score: 0.652, worst score: 0.147 

Cluster 0, has 2 hits: '(OTHER), state:2' 

United States, America, Republic 

State, U.S., North 
Cluster 1, has 3 hits: 'utah:3' 

Bushnell, Nolan Kay, Founder-chairman 

Orem, City, Utah County 

Utah, University, Institution 
Cluster 2, has 9 hits: 'city:8, utah:8, mormon:5, state:4, name:3, lake:3, salt 

City, Davis County, Utah 

Deseret, State, Name 

Logan, City, Seat 

Murray, City, Salt Lake County 

Nevada, State, U.S. 

Provo, City, Seat 

Salt Lake City, City, Capital 

Utah, State, U.S. 

Utah Lake, Freshwater, Lake 
Cluster 3, has 3 hits: 'mormonism:3, polygamy:3, smith:3, mormon:3, church:2, ki 

Mormonism, World, Religion 

Smith, Joseph, Religious 

Brigham, Religious, Leader 
Cluster 4, has 7 hits: 'utah:6, colorado:5, mi:4, km:4, river:2, yampa:2, ute:2, 

Colorado, State, United States 

Colorado, River, North America 

Salt Lake, Body, Salt 

Green, River, Utah 

Hovenweep National Monument, Colorado, Utah 
Uinta Mountains, Range, Mountain 
Ute, North American Indian, Tribe 
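
Each "Cluster N, has H hits" line in the listings above pairs a hit count with the most frequent property values among the members of the cluster. The following is a minimal sketch of how such a summary line might be assembled. It assumes each object is held as a title plus a list of property values; the function name summarize_cluster, the tie-breaking rule, and the property values assigned to the sample objects are illustrative assumptions and are not taken from the log.

```python
from collections import Counter


def summarize_cluster(index, members, top=5):
    """Format a cluster summary line in the style of the listings above.

    members is a list of (title, property_values) pairs; this layout is
    a hypothetical one chosen for the sketch.
    """
    counts = Counter()
    for _title, values in members:
        # Count each value once per object, so no count exceeds the hit total.
        counts.update(set(values))
    # Most frequent values first; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    summary = ", ".join(f"{value}:{n}" for value, n in ranked)
    return f"Cluster {index}, has {len(members)} hits: '{summary}'"


# Titles come from the utah.txt listing; the property values are invented.
members = [
    ("Mormonism, World, Religion", ["mormonism", "polygamy", "smith"]),
    ("Smith, Joseph, Religious", ["mormonism", "smith", "polygamy"]),
    ("Brigham, Religious, Leader", ["mormonism", "polygamy", "smith"]),
]
print(summarize_cluster(3, members))
```

Counting a value once per object keeps each count bounded by the number of hits, which agrees with the listings, where no value count exceeds the hit count of its cluster.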
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CLAIMS 

I Claim: 

1. A system for case-based organizing and querying of 
a database, said database having a set of objects, said system 
comprising 

means for organizing said database, by examining each 
object in said database and associating that object with a first 
set of property values; 

means responsive to a query, by associating said query 
with a second set of property values and performing matching on 
the objects of the database for objects which are similar. 

2. A system as in claim 1, wherein said objects 
comprise text. 

3. A system as in claim 1, wherein said first set of 
property values comprise keywords or other indicators of content. 

4. A system as in claim 1, wherein said first set of 
property values comprise those words which appear more frequently 
in the document than in the database at large. 

5. A system as in claim 1, wherein said first set of 
property values comprise those words which appear in a 
predetermined section of text of the object. 
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6. A system as in claim 1, wherein said first set of 
property values comprise those words which appear in a title of 
the object. 

7. A system as in claim 1, wherein said matching is 
case-based matching or other fuzzy associative matching. 

8. A system as in claim 1, wherein said query 
comprises text. 

9. A system as in claim 1, wherein said means 
responsive to a query associates said query with keywords or 
other indicators of its content. 

10. A system as in claim 1, comprising means for 
presenting a set of matched objects in response to said query. 

11. A system as in claim 1, comprising means 
responsive to refinement of said query. 

12. A system as in claim 1, comprising means 
responsive to iterative refinement of said query. 

13. A system as in claim 12, wherein said means 
responsive to iterative refinement uses a case-based technique. 
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14. A system as in claim 1, comprising means for 
ordering said set of matched objects in response to quality of 
match. 



15. A system as in claim 1, comprising means for 
organizing said set of matched objects. 

16. A system as in claim 15, wherein said means for 
organizing comprises means for grouping said set of matched 
objects into a set of clusters. 



17. A system as in claim 15, wherein said means for 
organizing comprises means for grouping said set of matched 
objects into a set of clusters of objects which have similar 
properties, which relate to similar content, which have similar 
likelihood to be of relevance to the query, or which have similar 
likelihood to be of interest to an operator posing the query. 

18. A system as in claim 15, comprising means for 
generating suggestions for iterative refinement of said query. 

19. A system as in claim 18, wherein said means for 
generating is responsive to a result of organizing matched 
objects. 
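
The organizing and matching recited in claims 1, 4 and 14 can be illustrated with a minimal sketch, given here under stated assumptions and not as the claimed implementation. Property values are taken to be the words whose relative frequency in an object exceeds their relative frequency in the database at large (claim 4), a text query is reduced to its set of words, and matched objects are ordered by quality of match (claim 14). Jaccard overlap stands in for the case-based matching; the names tokenize, property_values, organize and query and the three-object sample database are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def property_values(text, db_freq, db_total):
    """Words relatively more frequent in this text than in the whole database."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    return {w for w, n in counts.items() if n / total > db_freq[w] / db_total}


def organize(database):
    """Associate every object (title -> text) with a first set of property values."""
    db_freq = Counter()
    for text in database.values():
        db_freq.update(tokenize(text))
    db_total = sum(db_freq.values())
    return {title: property_values(text, db_freq, db_total)
            for title, text in database.items()}


def query(index, text):
    """Return (title, score) pairs for similar objects, best match first."""
    wanted = set(tokenize(text))  # second set of property values
    scored = []
    for title, values in index.items():
        union = wanted | values
        # Jaccard overlap: an assumed stand-in for case-based matching.
        score = len(wanted & values) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(title, round(score, 3)) for score, title in scored]


# Hypothetical sample database; the texts are invented for illustration.
database = {
    "Pope": "the pope leads the church in rome",
    "Sound": "sound is a wave the ear detects sound",
    "Strike": "a strike is a labor action by the union",
}
index = organize(database)
print(query(index, "sound union labor"))
```

In this sample the query shares two property values with the "Strike" object and one with the "Sound" object, so "Strike" is ordered first. Any other similarity measure could be substituted in query without changing the organizing step.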
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